SOME PEOPLE HAVE ALL THE LUCK


RICHARD ARRATIA, SKIP GARIBALDI, LAWRENCE MOWER, AND PHILIP B. STARK 


Abstract. We look at the Florida Lottery records of winners of prizes worth 
$600 or more. Some individuals claimed large numbers of prizes. Were they 
lucky, or up to something? We distinguish the “plausibly lucky” from the 
“implausibly lucky” by solving optimization problems that take into account 
the particular games each gambler won, where plausibility is determined by 
finding the minimum expenditure so that if every Florida resident spent that 
much, the chance that any of them would win as often as the gambler did 
would still be less than one in a million. Dealing with dependent bets relies 
on the BKR inequality; solving the optimization problem numerically relies on 
the log-concavity of the regularized Beta function. Subsequent investigation 
by law enforcement confirmed that the gamblers we identified as “implausibly 
lucky” were indeed behaving illegally.

It is unusual to win a lottery prize worth $600 or more. No one we know has.
But ten people have each won more than 80 such prizes in the Florida Lottery.
This seems fishy. Someone might get lucky and win the Mega Millions jackpot (a
1-in-259 million chance) having bought just one ticket. But it’s implausible that a
gambler would win many unlikely prizes without having bet very many times.

How many? We pose an optimization problem whose answer gives a lower bound
on any sensible estimate of an alleged gambler’s spending: over all possible
combinations of Florida Lottery bets, what is the minimum amount spent so that, if
every Florida resident spent that much, the chance that any of them would win so
many times is still less than one in a million? If that amount is implausibly large
compared to that gambler’s means, we have statistical evidence that she is up to
something.

Solving this optimization problem in practice hinges on two math facts:

• an inequality that lets us bound the probability of winning dependent bets
in some situations in which we do not know precisely which bets were made.

• log-concavity of the regularized Beta function, which lets us show that any 
local minimizer attains the global minimal value. 

We conclude that 2 of the 10 suspicious gamblers could just be lucky. The other 8 
are chiseling or spending implausibly large sums on lottery tickets. These results 
were used by one of us (LM) to focus on-the-ground investigations and to
support an exposé of lax security in the Florida Lottery [17]. We describe what those
investigations found, and the policy consequences in Florida and other states. 

1. HOW LONG CAN A GAMBLER GAMBLE?

Is there a non-negligible probability that a pathological gambler of moderate 
means could win many $600+ prizes? If not, we are done: our suspicion of these 
10 gamblers is justified. 

So, suppose a gambler starts with a bankroll of S_0 and buys a single kind of
lottery ticket over and over again. If he spends his initial bankroll and all his 
winnings, how much would he expect to spend in total and how many prizes would 
he expect to collect before going broke? 

Let the random variable X denote the value of a ticket, payoff minus cost. We 
assume that 

(1) E(X) < 0, 

because that is the situation in the games where our suspicious winners claimed 
prizes. (Lottery tickets occasionally do have positive expectation;
see [12] or [1].) Assumption (1) and the Law of Large Numbers say that a
gambler with a finite bankroll eventually will run out of money, with probability 1. 
The question is: how fast? 

Write c > 0 for the cost of the ticket, so that 

(2) P(X \ge -c) = 1 \quad\text{and}\quad P(X = -c) \ne 0.

To illustrate our assumptions and notation, let’s look at a concrete example of a 
Florida game, Play 4. It is based on the numbers or policy game formerly offered 
by organized crime, described in [14] and [20]. Variations on it are offered in most 
states that have a lottery. 

Example 1.1 (Florida’s Play 4 game). Our ten gamblers claimed many prizes in 
Florida’s Play 4 game, although in 2012 it only accounted for about 6% of the 
Florida Lottery’s $4.45 billion in sales. Here are the rules, simplified in ways that 
don’t change the probabilities. 

The Lottery draws a 4-digit random number twice a day. A gambler can bet 
on the next drawing by paying c = $1 for a ticket, picking a 4-digit number, and 
choosing “straight” or “box.” 

If the gambler bets “straight,” she wins $5000 if her number matches the next
4-digit number exactly (which has probability p = 10^{-4}). She wins nothing otherwise.
The expected value of a straight ticket is E(X) = $5000 × 10^{-4} − $1 = −$0.50.

If a gambler bets “box,” she wins if her number is a permutation of the digits in 
the next 4-digit number the Lottery draws. She wins nothing otherwise. The
probability of winning this bet depends on the number of distinguishable permutations
of the digits the gambler selects.
For instance, if the gambler bets on 1112, there are 4 possible permutations, 1112,
1121, 1211, and 2111. This bet is a “4-way box.” It wins $1198 with probability
1/2500 = 4 × 10^{-4}, since 4 of the 10,000 equally likely outcomes are permutations of
those four digits. If the gambler bets on 1122, there are 6 possible permutations of
the digits; this bet is called a “6-way box.” It wins $800 with probability 6 × 10^{-4}.
(The 6-way box is relatively unpopular, accounting for less than 1% of Play 4
tickets.) Buying such a ticket has expected value E(X) ≈ −$0.52. Similarly, there
are 12-way and 24-way boxes. 
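
The expected values quoted in Example 1.1 can be checked with a few lines of Python. This is only a sketch: the helper name expected_value and the tickets dictionary are our own, and the prize amounts are simply the ones stated above.

```python
def expected_value(prize, ways, cost=1.0, outcomes=10_000):
    """Net expected value of a $1 Play 4 ticket that pays `prize` dollars
    when the draw is one of `ways` equally likely numbers out of 10,000."""
    return prize * ways / outcomes - cost

# (prize in dollars, number of winning 4-digit numbers), taken from Example 1.1
tickets = {"straight": (5000, 1), "4-way box": (1198, 4), "6-way box": (800, 6)}

for name, (prize, ways) in tickets.items():
    p = ways / 10_000
    print(f"{name}: p = {p:.4f}, E(X) = {expected_value(prize, ways):+.4f}")
```

Both box tickets come out near −$0.52, slightly worse for the gambler than the −$0.50 of a straight ticket.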

Returning to the abstract setting, the gambler’s bankroll after t bets is 
    S_t := S_0 + X_1 + X_2 + \cdots + X_t,

where X_1, ..., X_t are i.i.d. random variables with the same distribution as X, and
X_i is the net payoff of the i-th ticket. The gambler can no longer afford to keep
buying tickets after the T-th one, where T is the smallest t ≥ 0 for which S_t < c.

Proposition 1.2. In the notation of the preceding paragraph,

    \frac{S_0 - c}{|E(X)|} < E(T) \le \frac{S_0}{|E(X)|},

with equality on the right if S_0 and all possible values of X are integer multiples of c.

In most situations, S_0 is much larger than c, and the two bounds are almost
identical. In expectation, the gambler spends a total of cE(T) on tickets, including
all of his winnings, which amount to cE(T) − S_0.

Proof. By the definition of T and (2),

(3) 0 \le E(S_T) < c,

with equality on the left in case S_0 and X are integer multiples of c. Now the crux
is to relate E(T) to E(S_T). If T were constant (instead of random), then T = E(T)
and we could simply write

(4) E(S_T) = E\Bigl(S_0 + \sum_{i=1}^{T} X_i\Bigr) = S_0 + E(T)\,E(X),

and combining this with (3) would give the claim. The key is that equation (4)
holds even though T is random; this is Wald’s Equation (see, e.g., [7, §5.4]). The
essential property is that T is a stopping time, i.e., for every k > 0, whether or
not one places a k-th bet is determined just from the outcomes of the first k − 1
bets. □

You might recognize that we are considering a version of the gambler’s ruin
problem, but with an unfair bet and a house that has infinite money; for bounds
on gambler’s ruin without these hypotheses, see, e.g., [8].

A ticket with just one prize. The proposition lets us address the question from
the beginning of this section. Suppose a ticket pays j with probability p and nothing
otherwise; the expected value of the ticket, E(X) = pj − c, is negative; and j is an
integer multiple of c. If a gambler starts with a bankroll of S_0 and spends it all on
tickets, successively using the winnings to buy more tickets, then by Proposition
1.2 the gambler should expect to buy E(T) = S_0/(c − pj) tickets, which means
winning

    \frac{cE(T) - S_0}{j} = \frac{p S_0}{c - pj}

prizes.

Example 1.3. How many prizes might a compulsive gambler of “ordinary” means 
claim? Surely some gamblers have lost houses, so let us say he starts with a bankroll 
worth S_0 = $175,000, an amount between the median list price and the median sale
price of a house in Florida [24]. If he always buys Play 4 6-way box tickets and
recycles his winnings to buy more tickets, the previous paragraph shows that he
can expect to win about

    pS_0/(c − pj) = 6 × 17.5/0.52 ≈ 202 times.

This is big enough to put him among the top handful of winners in the history of 
the Florida lottery. 
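
The arithmetic of Proposition 1.2 and Example 1.3 can be reproduced as follows. This is a minimal sketch; the function names are ours, and the inputs are the 6-way box figures from Example 1.1 together with the $175,000 bankroll assumed above.

```python
def expected_tickets_bounds(S0, c, ev):
    """Bounds (S0 - c)/|E(X)| < E(T) <= S0/|E(X)| from Proposition 1.2."""
    return (S0 - c) / abs(ev), S0 / abs(ev)

def expected_prizes(S0, c, p, j):
    """Expected number of prizes, p*S0/(c - p*j), for a single-prize ticket."""
    return p * S0 / (c - p * j)

# Example 1.3: bankroll, ticket cost, win probability, prize (6-way box)
S0, c, p, j = 175_000, 1.0, 6e-4, 800
lower, upper = expected_tickets_bounds(S0, c, p * j - c)
print(f"E(T) is between {lower:,.0f} and {upper:,.0f} tickets")
print(f"expected number of prizes: {expected_prizes(S0, c, p, j):.1f}")
```

With these inputs the two bounds on E(T) differ by fewer than two tickets, which illustrates the remark after Proposition 1.2 that the bounds are almost identical when S_0 is much larger than c.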

Hence, the number of wins alone does not give evidence that a gambler cheated. 
We must take into account the particulars of the winning bets. 


2. A TOY VERSION OF THE PROBLEM 


From here on, a “win” means a win large enough to be recorded; for Florida, 
the threshold is $600. Suppose for the moment that a gambler only buys one kind 
of lottery ticket, and that each ticket is for a different drawing, so that wins are 
independent. Suppose each ticket has probability p of winning. 

A gambler who buys n tickets spends cn and, on average, wins np times. This 
is intuitively obvious, and follows formally by modeling a lottery bet as a Bernoulli 
trial with probability p of success: in n trials we expect np successes. 

We don’t know n, and the gambler is unlikely to tell us. But based on the 
calculation in the preceding paragraph, we might guess that a gambler who won W 
times bought roughly W/p tickets. Indeed, an unbiased estimate for n is \hat{n} := W/p,
corresponding to the gambler spending c\hat{n} on tickets. Since p is very small, like
10^{-4}, the number \hat{n} is big, and so is the estimated amount spent, c\hat{n}. (Note that
this estimate includes any winnings “reinvested” in more lottery tickets.)

A gambler confronted with \hat{n} might quite reasonably object that she is just very
lucky, and that the true number of tickets she bought, n, is much smaller. Under
the assumptions in this section, her tickets are i.i.d. (independent, identically
distributed) Bernoulli trials, and the number of wins W has a binomial distribution
with parameters n and p, which lets us check the plausibility of her claim by
considering 


(5) D(n; w, p) := \Bigl(\text{probability of at least } w \text{ wins with } n \text{ tickets}\Bigr) = \sum_{k=w}^{n} \binom{n}{k} p^k (1-p)^{n-k}.

Modeling a lottery bet as a Bernoulli trial is precisely correct in the case of 
games like Play 4. But for scratcher games, there is a very large pool from which 
the gambler is sampling without replacement by buying tickets; as the pool is much 
larger than the values of n that we will consider, the difference between drawing 
tickets with and without replacement is negligible.

Example 2.1 (Louis Johnson). Of the 10 people who had won more than 80
prizes each in the Florida Lottery, the second most-frequent prize claimant was
Louis Johnson. He claimed W = 57 $5,000 prizes from straight Play 4 tickets (as
well as many prizes in many other games that we ignore in this example). We
estimate that he bought \hat{n} = W/p = 570,000 tickets at a cost of $570,000.

What if he claimed to only have bought n = 175,000 tickets? The probability of 
winning at least 57 times with 175,000 tickets is 

    D(175000; 57, 10^{-4}) \approx 6.3 \times 10^{-14}.

For comparison, by one estimate there are about 400 billion stars in our galaxy 
[13]. Suppose there were a list of all those stars, and two people independently 
pick a star at random from that list. The chance they would pick the same star is 
minuscule, yet it is still 40 times greater than the probability we just calculated. 
It is utterly implausible that a gambler wins 57 times by buying 175,000 or fewer 
tickets. 
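
The tail probability in Example 2.1 is far too small to obtain by subtracting a cumulative probability from 1 in floating point, so a sketch of the computation has to sum the upper tail of (5) directly. The version below uses only the Python standard library and works with logarithms of the binomial terms; the function names log_pmf and D are our own choices, and this is not necessarily how the figure above was originally computed.

```python
import math

def log_pmf(k, n, p):
    """Log of the binomial probability C(n, k) p^k (1 - p)^(n - k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def D(n, w, p):
    """P(at least w wins with n independent tickets), equation (5)."""
    if w <= n * p:
        # At or below the mean: subtract the short lower tail from 1.
        return 1.0 - sum(math.exp(log_pmf(k, n, p)) for k in range(w))
    total = 0.0
    for k in range(w, n + 1):
        # Above the mean the terms decrease in k, so stop once negligible.
        term = math.exp(log_pmf(k, n, p))
        total += term
        if term <= 1e-17 * total:
            break
    return total

prob = D(175_000, 57, 1e-4)
print(f"D(175000; 57, 1e-4) = {prob:.2e}")
# Chance that two people pick the same one of 400 billion stars, relative to D
print(f"same-star chance / D = {(1 / 400e9) / prob:.0f}")
```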


3. What this has to do with Joe DiMaggio 

The computation in Example 2.1 does not directly answer whether Louis Johnson 
is lucky or up to something shady. The most glaring problem is that we have 
calculated the probability that a particular innocent gambler who buys $175,000 
of Play 4 tickets would win so many times. The news media have publicized some 
lottery coincidences as astronomically unlikely, yet these coincidences have turned 
out to be relatively unsurprising given the enormous number of people playing the 
lottery; see, for example, [5, esp. p. 859] or [21] and the references therein. 

Among other things, we need to check whether so many people are playing 
Play 4 so frequently that it’s reasonably likely at least one of them would win at 
least 57 times. If so, Louis Johnson might be that person, just like with Mega 
Millions: no particular ticket has a big chance of winning, but if there are enough 
gamblers, there is a big chance someone wins. 

We take an approach similar to how baseball probability enthusiasts attempt to 
answer the question, Precisely how amazing was Joe DiMaggio? Joe DiMaggio is 
famous for having the longest hitting streak in baseball: he hit in 56 consecutive 
games in 1941. (The modern player with the second longest hitting streak is Pete 
Rose, who hit in 44 consecutive games in 1978.) One way to frame the question is 
to consider the probability that a randomly selected player gets a hit in a game, 
and then estimate the probability that there is at least one hitting streak at least 
56 games long in the entire history of baseball. If a streak of 56 or more games is 
likely, then the answer to the question is “not so amazing”; DiMaggio just happened 
to be the person who had the unsurprisingly long streak. If it is very unlikely that 
there would be such a long streak, then the answer is: DiMaggio was truly amazing. 
(The conclusions in DiMaggio’s case have been equivocal, see the discussion in [11, 
pp. 30-38].) 

Let’s apply this reasoning to Louis Johnson’s 57 Play 4 wins (Example 2.1). 
Suppose that N gamblers bought Play 4 tickets during the relevant time period, 
each of whom spent at most $175,000. Then an upper bound on the probability 
that at least one such gambler would win at least 57 times is the chance of at 
least one success in N Bernoulli trials, each of which has probability no larger than 
p ≈ 6.3 × 10^{-14} of success. (Louis Johnson represents a success.) The trials might
not be independent, because different gamblers might bet on the same numbers for
the same game, but the chance that at least one of the N gamblers wins at least
57 times is at most Np by the Bonferroni bound: for any set of events A_1, ..., A_N,

    P\Bigl(\bigcup_{i=1}^{N} A_i\Bigr) \le \sum_{i=1}^{N} P(A_i).

What is N? Suppose it’s the current population of Florida, approximately 19 million.
Then the chance at least one person would win at least 57 times is no larger
than 19 × 10^6 × 6.3 × 10^{-14} ≈ 0.0000012, just over one in a million.

This estimate is crude: the number of gamblers is very rough, and the bound is
far from sharp. It gives a lot away in the direction of making the gambler look
less suspicious, because most people spend nowhere near $175,000 on the lottery.
We are giving even more away because Louis Johnson won many other bets (his
total winnings are, of course, dwarfed by the expected cash outlay). Considering
all these factors, one might reasonably conclude that either Louis Johnson has a
hidden source of money (perhaps he is a wealthy heir with a gambling problem)
or he is up to something.

Example 3.1 (Louis Johnson 2). In Example 2.1 we picked the $175,000 spending 
level almost out of thin air, based on Florida house prices as in Example 1.3. Instead 
of starting with a limit on spending and deducing the probability of a number of 
wins, let’s start with a probability, ε = 5 × 10^{-14}, and infer the minimum spending
required to have at least that probability of so many wins.

If Johnson buys n tickets, then he wins at least 57 times with probability
D(n; 57, 10^{-4}). We compute n_0, the smallest n such that

    D(n; 57, 10^{-4}) \ge \varepsilon,

which gives n_0 = 174,000. Using the Bonferroni bound again, we find that if
everyone in Florida spent $174,000 on straight Play 4 tickets, the
chance that any of them would win 57 times or more is less than one in a million.
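
Because D(n; w, p) increases with n, the threshold n_0 of Example 3.1 can be located by bisection. The sketch below assumes that monotonicity and uses only the standard library; the names D and smallest_n are ours, and the population figure of 19 million is the rough value used in Section 3.

```python
import math

def log_pmf(k, n, p):
    """Log of the binomial probability C(n, k) p^k (1 - p)^(n - k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def D(n, w, p):
    """P(at least w wins with n independent tickets), equation (5)."""
    if w <= n * p:
        return 1.0 - sum(math.exp(log_pmf(k, n, p)) for k in range(w))
    total = 0.0
    for k in range(w, n + 1):       # terms decrease in k above the mean
        term = math.exp(log_pmf(k, n, p))
        total += term
        if term <= 1e-17 * total:
            break
    return total

def smallest_n(w, p, eps, hi):
    """Smallest integer n >= w with D(n; w, p) >= eps, assuming D(hi) >= eps."""
    assert D(hi, w, p) >= eps, "upper end of the search range is not feasible"
    lo = w - 1                      # fewer than w tickets can never give w wins
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if D(mid, w, p) >= eps:
            hi = mid
        else:
            lo = mid
    return hi

eps = 5e-14
n0 = smallest_n(57, 1e-4, eps, hi=175_000)
print("n_0 =", n0)
print("Bonferroni bound for 19 million gamblers:", 19e6 * D(n0, 57, 1e-4))
```

The search returns a value that rounds to the 174,000 quoted in the example, and the resulting Bonferroni bound is just under one in a million.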


4. Multiple kinds of tickets 


Real lottery gamblers tend to wager on a variety of games with different odds of 
winning and different payoffs. Suppose they place b different kinds of bets. (It might 
feel more natural to say “games,” but a gambler could place several dependent bets 
on a single Play 4 drawing: straight, several boxes, etc.) 

Number the bets 1, 2, ..., b. Bet i costs c_i dollars and has probability p_i of
winning. The gambler won more than the threshold on bet i w_i times. We don’t
know n_i, the number of times the gambler wagered on bet i. If we did know the
vector n = (n_1, ..., n_b), then we might be able to calculate the probability:

(6) P(n; w, p) := \Bigl(\text{probability of winning at least } w_i \text{ times on bet } i \text{ with } n_i \text{ tickets, for all } i\Bigr).

As in Example 3.1, we can find a lower bound on the amount spent to attain w_i
wins on bet i, i = 1, ..., b, by solving

(7) c \cdot n^* = \min_{n} c \cdot n \quad\text{s.t.}\quad n_i \ge w_i \ \text{and}\ P(n; w, p) \ge \varepsilon.

For a typical gambler that we study, this lower bound c·n* will be in the millions
of dollars. Thinking back to the “Joe DiMaggio” justification for why (7) is a lower
bound, it is clear that not every resident of Florida would spend so much on lottery
tickets, and our gut feeling is that a more refined justification would produce a
larger lower bound for the amount spent.

But how can we find P(n; w, p)? If the different bets were on independent events
(say, each bet is a different kind of scratcher ticket), then

(8) P(n; w, p) = \prod_{i=1}^{b} \Bigl(\text{probability of winning at least } w_i \text{ times on bet } i \text{ with } n_i \text{ tickets}\Bigr) = \prod_{i=1}^{b} D(n_i; w_i, p_i).

But gamblers can make dependent bets, in which case (8) does not hold.
Fortunately, it is possible to derive an upper bound for the typical case, as we now
show.


5. NO DEPENDENT WINS IS ALMOST AS GOOD AS INDEPENDENT BETS

For most of the 10 gamblers, we did not observe wins on dependent bets, such 
as a win on a straight ticket and a win on a 4-way box ticket for the same Play 4 
drawing. We seek to prove Proposition 5.1 (below), which says that if there were 
no wins on dependent bets, then treating the bets as if they were independent gives 
an overall upper bound on the probability P in (6). 

Abstractly, we envision a finite number d of independent drawings, such as a
sequence of Play 4 drawings. For each drawing j, j = 1, ..., d, the gambler may
bet any amount on any of b different bets (such as 1234 straight, 1344 12-way box,
etc.), whose outcomes for drawing j may be dependent, but whose outcomes on
different draws are independent. We write p_i for the probability that a bet on i
wins in any particular drawing; p_i is the same for all drawings j.

For i = 1, ..., b and j = 1, ..., d, let n_{ij} ∈ {0, 1} be the indicator that the
gambler wagered on bet i in drawing j, so that the i-th row sum, n_i := \sum_j n_{ij}, is the
total number of bets on i. We call the entire system of bets B, represented by the
b-by-d zero-one matrix B = [n_{ij}].

Proposition 5.1. Suppose that, for each i, a gambler wagers on bet i in n_i different
drawings, as specified by B, above. Given the bets B, consider the events

    W_i := (gambler wins bet i at least w_i times with bets B),

and the event

    I := (in each drawing j, the gambler wins at most one bet).

Then

(9) P(I \cap W_1 \cap \cdots \cap W_b) \le \prod_{i=1}^{b} P(W_i).

In our case, P(W_i) = D(n_i; w_i, p_i), so we restate (9) as:

(10) P(I \cap W_1 \cap \cdots \cap W_b) \le \prod_{i=1}^{b} D(n_i; w_i, p_i).

Proposition 5.1 is intuitively plausible: even though the bets are not independent,
the drawings are, and event I guarantees that any single drawing helps at most one
of the events {W_i} to occur. We prove Proposition 5.1 as a corollary of an extension
of a celebrated result, the BKR inequality, named for van den Berg-Kesten-Reimer,
conjectured in [23], and proved in [19] and [22] (or see [3]). The remainder of this
section provides the details. The original BKR inequality is stated as Theorem 5.6.

We separate the purely set-theoretic aspects of the discussion, in Sections 5a and
5c, from the probabilistic aspects, in Section 5d.

5a. The BKR operation □. Let S be an arbitrary set, and write S^d for the
Cartesian product of d copies of S. Since our application is probability, we call an
element ω = (ω_1, ..., ω_d) ∈ S^d an outcome, and we call any A ⊆ S^d an event.

For a subset J ⊆ {1, ..., d} and an outcome ω ∈ S^d, the J-cylinder of ω, denoted
Cyl(J, ω), is the collection of ω' ∈ S^d such that ω'_j = ω_j for all j ∈ J. For events
A_1, A_2, ..., A_b let A_1 □ A_2 □ ··· □ A_b ⊆ S^d be the set of ω for which there exist
pairwise disjoint J_1, J_2, ..., J_b ⊆ {1, ..., d} such that Cyl(J_i, ω) ⊆ A_i for all i. The
case b = 2, where one combines just two events, is the context for the original BKR
inequality as in [23, p. 564]; the operation with b > 2 is new and is the main study
of this section.

Here is another definition of □ that might be more transparent. Given an event
A ⊆ S^d and a subset J ⊆ {1, ..., d}, define the event

    [A]_J := \{\omega \in A \mid \mathrm{Cyl}(J, \omega) \subseteq A\} = \bigcup_{\{\omega \mid \mathrm{Cyl}(J, \omega) \subseteq A\}} \mathrm{Cyl}(J, \omega).

Informally, [A]_J consists of the outcomes in A, such that by looking only at the
coordinates indexed by J, one can tell that A must have occurred. Evidently, for
A, B ⊆ S^d,

(11) A \subseteq B \text{ implies } [A]_J \subseteq [B]_J \quad\text{and}\quad J \subseteq K \text{ implies } [A]_J \subseteq [A]_K.

The definition of □ becomes:

(12) \mathop{\square}_{1 \le i \le b} A_i := \bigcup_{\text{pairwise disjoint } J_1, \ldots, J_b \subseteq \{1, \ldots, d\}} [A_1]_{J_1} \cap [A_2]_{J_2} \cap \cdots \cap [A_b]_{J_b}.

We read the above definition as “□_{1≤i≤b} A_i is the event that all b events occur,
with b disjoint sets of reasons to simultaneously certify the b events.” Informally,
the outcome ω, observed only on the coordinate indices in J_i, supplies the “reason”
that we can certify that event A_i occurs.

Our notation □_{1≤i≤b} A_i = A_1 □ A_2 □ ··· □ A_b is intentionally analogous to the
notations for set intersection, ⋂_{1≤i≤b} A_i = A_1 ∩ A_2 ∩ ··· ∩ A_b, and set union,
⋃_{1≤i≤b} A_i = A_1 ∪ A_2 ∪ ··· ∪ A_b. The multi-input operator □ is, like set intersection
⋂ and set union ⋃, fully commutative, i.e., unchanged by any re-ordering of the
inputs. Unlike intersection and union, □ is not associative, as we now show.

Example 5.2. Take S = {0, 1}, d = 3, and

    A = (0, *, *) \cup (1, 0, *), \quad B = (0, *, *) \cup (1, 1, *), \quad C = (*, 0, 1),

where we write for example (1, 0, *) = {(1, 0, 0), (1, 0, 1)} = Cyl({1, 2}, (1, 0, s)) for
s = 0, 1 and (0, *, *) = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}. Note that |A| = |B| = 6.
Then A □ B = (0, *, *), and (A □ B) □ C = {(0, 0, 1)}, using J_1 = {1} and J_2 = {2, 3}
in (12); but B □ C = {(0, 0, 1)} and A □ (B □ C) = ∅. Also, A □ B □ C = ∅.
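
The sets in Example 5.2 are small enough to check by exhaustive search. The following sketch implements definition (12) literally, trying every way of assigning each coordinate to one of the events or to none; the helper names certified, box, and cyl are ours, and coordinates are indexed from 0 rather than 1.

```python
from itertools import product

def certified(A, J, outcomes):
    """[A]_J: outcomes w in A such that every outcome agreeing with w on J is in A."""
    return {w for w in A
            if all(v in A for v in outcomes
                   if all(v[j] == w[j] for j in J))}

def box(events, S, d):
    """BKR operation (12) on a list of events, by brute force over all
    assignments of each coordinate to one event (or to none)."""
    outcomes = list(product(S, repeat=d))
    b = len(events)
    result = set()
    for labels in product(range(b + 1), repeat=d):   # label b means "unused"
        common = set(outcomes)
        for i, A in enumerate(events):
            J = [j for j in range(d) if labels[j] == i]
            common &= certified(A, J, outcomes)
        result |= common
    return result

def cyl(pattern):
    """Expand a pattern such as (0, '*', '*') into the set of matching outcomes."""
    choices = [(0, 1) if x == '*' else (x,) for x in pattern]
    return set(product(*choices))

S, d = (0, 1), 3
A = cyl((0, '*', '*')) | cyl((1, 0, '*'))
B = cyl((0, '*', '*')) | cyl((1, 1, '*'))
C = cyl(('*', 0, 1))

AB = box([A, B], S, d)
print(AB == cyl((0, '*', '*')))            # A box B equals (0, *, *)
print(box([AB, C], S, d))                  # (A box B) box C
print(box([A, box([B, C], S, d)], S, d))   # A box (B box C)
print(box([A, B, C], S, d))                # three-input A box B box C
```

The two bracketings give different answers, confirming that □ is not associative on this example.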

5b. The connection between lottery drawings and □. Before continuing to
discuss the BKR operation □ in the abstract, we consider what it means for lottery
drawings. We take S = 2^b to encode the results of a single draw: an element s ∈ S
answers, for each of the b bets, whether that bet wins or not. The sample space
for our probability model is S^d; the j-th coordinate ω_j reports the results of the b
bets on the j-th draw.

It is easy to see that, in the notation of Proposition 5.1,

(13) (I \cap W_1 \cap \cdots \cap W_b) \subseteq \mathop{\square}_{i=1}^{b} W_i.

Indeed, given an outcome ω ∈ I ∩ W_1 ∩ ··· ∩ W_b, we can take, for i = 1 to b,
J_i := {j | on draw j, bet i wins and n_{ij} = 1}. Since ω ∈ I, the sets J_1, ..., J_b are
mutually disjoint; and since ω ∈ W_i, |J_i| ≥ w_i. Hence, Cyl(J_i, ω) ⊆ W_i, and thus
ω ∈ [W_i]_{J_i}, for i = 1 to b.

Example 5.3. The left hand side of (13) can be a strict subset of the right hand
side. For example, with b = 2 bets and d = 2 draws, suppose that w_1 = w_2 = 1
and the gambler lays both bets on both draws. The outcome where both bets win
on both draws is not in the left side of (13) but is in W_1 □ W_2.

To write this example out fully, we think of the binary encoding, S = {0, 1, 2, 3}
corresponding to {00, 01, 10, 11}, so that, for example, 0 ∈ S represents a draw
where both bets lose, 1 ∈ S represents the outcome 01 where the first bet loses and
the second bet wins, 2 ∈ S represents the outcome 10 where the first bet wins and
the second bet loses, and 3 ∈ S represents the outcome 11 where both bets win.

The event I is the set of ω = (ω_1, ω_2) for which no coordinate ω_j is equal to 3.
The event W_1 is the set of ω such that at least one of the coordinates is equal to 2
or 3, and the event W_2 is the set of ω such that at least one of the coordinates is
equal to 1 or 3. Certainly,

    I \cap W_1 \cap W_2 = \{(1, 2), (2, 1)\},

yet

    W_1 \mathbin{\square} W_2 = \{(1, 2), (2, 1), (1, 3), (2, 3), (3, 1), (3, 2), (3, 3)\}.


5c. Set theoretic considerations related to the BKR inequality. It is
obvious that, for events B_1, ..., B_r ⊆ S^d and J ⊆ {1, ..., d},

(14) \Bigl[\bigcap_{1 \le i \le r} B_i\Bigr]_J = \bigcap_{1 \le i \le r} [B_i]_J \quad\text{and}\quad \Bigl[\bigcup_{1 \le i \le r} B_i\Bigr]_J \supseteq \bigcup_{1 \le i \le r} [B_i]_J.

For unions, the containment may be strict, as in Example 5.2, where A ∪ B = S^d
hence [A ∪ B]_∅ = S^d, whereas [A]_∅ = [B]_∅ = ∅.

Lemma 5.4 (Composition of cylinder operators). For A ⊆ S^d and J, K ⊆ {1, ..., d},

    [[A]_J]_K = [A]_{J \cap K}.

Proof. Suppose first that ω ∈ [[A]_J]_K. That is, Cyl(K, ω) ⊆ [A]_J: if ω' ∈ S^d agrees
with ω on K, then Cyl(J, ω') ⊆ A. We must show that ω is in [A]_{J∩K}, i.e., if ω'' is
in Cyl(J ∩ K, ω), then ω'' is in A.

Given ω'' ∈ Cyl(J ∩ K, ω), pick ω' to agree with ω on K and with ω'' on the
complement {1, ..., d} \ K. Then ω' agrees with ω'' on ({1, ..., d} \ K) ∪ (J ∩ K),
so on J. Hence ω'' ∈ Cyl(J, ω') ⊆ A, proving ⊆.

We omit the proof of the containment ⊇, which is easier. □
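
Lemma 5.4 can also be sanity-checked by exhaustive enumeration on a small sample space. The sketch below tests the identity for every event A in {0, 1}^3 and every pair of index sets J, K; it is a numerical check on one small case, not a substitute for the proof, and the helper name certified is ours (coordinates are indexed from 0).

```python
from itertools import combinations, product

d = 3
outcomes = list(product((0, 1), repeat=d))            # S^d with S = {0, 1}
index_sets = [frozenset(c) for r in range(d + 1)
              for c in combinations(range(d), r)]     # every J within {0, ..., d-1}

def certified(A, J):
    """[A]_J: outcomes w in A such that Cyl(J, w) is contained in A."""
    return frozenset(w for w in A
                     if all(v in A for v in outcomes
                            if all(v[j] == w[j] for j in J)))

failures = 0
for mask in range(2 ** len(outcomes)):                # every event A in S^d
    A = frozenset(w for i, w in enumerate(outcomes) if (mask >> i) & 1)
    by_J = {J: certified(A, J) for J in index_sets}   # cache [A]_J for each J
    for J in index_sets:
        for K in index_sets:
            # Compare [[A]_J]_K with [A]_{J intersect K}
            if certified(by_J[J], K) != by_J[J & K]:
                failures += 1
print("counterexamples to Lemma 5.4:", failures)
```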


Proposition 5.5. For A_1, A_2, ..., A_b ⊆ S^d, we have:

    \mathop{\square}_{i=1}^{b} A_i \subseteq ( \cdots ((A_1 \mathbin{\square} A_2) \mathbin{\square} A_3) \cdots ) \mathbin{\square} A_b.

Proof. By induction, using (11), it suffices to prove that

    \mathop{\square}_{i=1}^{b} A_i \subseteq \Bigl(\mathop{\square}_{i=1}^{b-1} A_i\Bigr) \mathbin{\square} A_b.

With unions over K ⊆ {1, ..., d} and pairwise disjoint J_1, J_2, ... ⊆ {1, ..., d},

(15) \Bigl(\mathop{\square}_{i=1}^{b-1} A_i\Bigr) \mathbin{\square} A_b = \bigcup_{K} \Bigl[\mathop{\square}_{i=1}^{b-1} A_i\Bigr]_K \cap [A_b]_{K^c}

(16) \qquad = \bigcup_{K} \Bigl[\bigcup_{J_1, \ldots, J_{b-1}} \bigcap_{i=1}^{b-1} [A_i]_{J_i}\Bigr]_K \cap [A_b]_{K^c}

(17) \qquad \supseteq \bigcup_{K} \bigcup_{J_1, \ldots, J_{b-1}} \Bigl(\bigcap_{i=1}^{b-1} \bigl[[A_i]_{J_i}\bigr]_K\Bigr) \cap [A_b]_{K^c}

(18) \qquad = \bigcup_{K} \bigcup_{J_1, \ldots, J_{b-1}} \Bigl(\bigcap_{i=1}^{b-1} [A_i]_{J_i \cap K}\Bigr) \cap [A_b]_{K^c}

(19) \qquad = \bigcup_{J_1, \ldots, J_b} \bigcap_{i=1}^{b} [A_i]_{J_i} = \mathop{\square}_{i=1}^{b} A_i.

The justifications are as follows. Line (15) is the definition, where K^c denotes the
complement of K. Line (16) follows by using the definition (12). The set inclusion
in line (17) results from applying both parts of (14). Line (18) follows by applying
Lemma 5.4 on the composition of cylinder operators. Line (19) is just re-labeling
the indices: the previous line is a union, indexed by pairwise disjoint J_1, ..., J_{b-1}
and a set K; for i = 1 to b − 1, K_i = J_i ∩ K, and for index b, we take K_b = K^c. The
set of possible indices a = (J_1 ∩ K, ..., J_{b-1} ∩ K, K^c) is identical to the set of
a = (K_1, ..., K_b) such that i ≠ j implies K_i ∩ K_j = ∅. We then switch notation
back, from K_i’s to J_i’s. □


5d. Probability considerations related to the BKR inequality. References 
to the BKR inequality were given just after Equation (10). 

Theorem 5.6 (The original BKR inequality). Let S be a finite set, and let P be 
a probability measure on S d for which the d coordinates are mutually independent. 
(The coordinates might have different distributions.) For any events A, B ⊆ S^d,
with the event A □ B as defined by (12),

    P(A \mathbin{\square} B) \le P(A)\,P(B).

Corollary 5.7. Under the hypotheses of Theorem 5.6, for b = 2, 3, ... and
A_1, ..., A_b ⊆ S^d,

(20) P(A_1 \mathbin{\square} A_2 \mathbin{\square} \cdots \mathbin{\square} A_b) \le \prod_{i=1}^{b} P(A_i).

Proof. For b = 2, (20) is the original BKR inequality. For b ≥ 3, we apply
Proposition 5.5 to see that

    P(A_1 \mathbin{\square} \cdots \mathbin{\square} A_b) \le P\bigl(( \cdots ((A_1 \mathbin{\square} A_2) \mathbin{\square} A_3) \cdots ) \mathbin{\square} A_b\bigr).

Applying the b = 2 case and induction provides the claim. □

We can now prove Proposition 5.1, which from our new perspective is a simple
corollary of the extended BKR inequality, Corollary 5.7.

Proof of Proposition 5.1. In view of the containment (13), we have:

    P(I \cap W_1 \cap \cdots \cap W_b) \le P\Bigl(\mathop{\square}_{i=1}^{b} W_i\Bigr),

and by Corollary 5.7

    P\Bigl(\mathop{\square}_{i=1}^{b} W_i\Bigr) \le \prod_{i=1}^{b} P(W_i). □


6. The optimization problem we actually solve 


In order to exploit the material in the previous section, we replace definition (6)
of P with

    P(n; w, p) := \Bigl(\text{probability of winning at least } w_i \text{ times on bet } i \text{ with } n_i \text{ tickets, for all } i, \text{ and no wins on dependent bets}\Bigr);

from Proposition 5.1, we know that then

(21) P(n; w, p) \le \prod_{i=1}^{b} D(n_i; w_i, p_i).

We will find a lower bound c·n* on the amount spent by a gambler who did not
win dependent bets by solving not (7), but rather

(22) c \cdot n^* = \min_{n} c \cdot n \quad\text{s.t.}\quad n_i \ge w_i \ \text{and}\ \prod_{i=1}^{b} D(n_i; w_i, p_i) \ge \varepsilon.

We furthermore relax the requirement that the numbers of bets, the n_i’s, be integers
and we extend the domain of D to include positive real values of n_i as in [2, p. 945,
26.5.24]:


(23) D(n; w, p) = I_p(w, n - w + 1), \quad\text{where}\quad I_x(a, b) := \frac{\int_0^x t^{a-1}(1-t)^{b-1}\,dt}{\int_0^1 t^{a-1}(1-t)^{b-1}\,dt}

is the regularized Beta function. The function I_x, or at least its numerator and
denominator, is available in many scientific computing packages, including Python’s
SciPy library. Extending the domain of the optimization problem to non-integral
n_i can only decrease the lower bound c·n*, and it brings two benefits, which we
now describe.
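
Identity (23) can be checked numerically without any special-function library by integrating the Beta density directly. The sketch below is an assumption-laden stand-in for a library routine: it uses Simpson's rule on [0, p], requires w > 1, and its binomial comparison assumes w > np so that the tail terms decrease. The function names are ours.

```python
import math

def D_binomial(n, w, p):
    """Upper binomial tail (5) for integer n; assumes w > n*p (decreasing terms)."""
    total = 0.0
    for k in range(w, n + 1):
        term = math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                        - math.lgamma(n - k + 1)
                        + k * math.log(p) + (n - k) * math.log1p(-p))
        total += term
        if term <= 1e-17 * total:
            break
    return total

def D_beta(n, w, p, steps=2000):
    """I_p(w, n - w + 1) from (23) by Simpson's rule; n may be non-integer."""
    a, b = w, n - w + 1
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        # Beta(a, b) density t^(a-1) (1-t)^(b-1) / B(a, b), evaluated via logs
        if t <= 0.0:
            return 0.0          # valid because a > 1 here
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_B)

    h = p / steps               # steps must be even for Simpson's rule
    s = density(0.0) + density(p)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * density(i * h)
    return s * h / 3

print(D_binomial(175_000, 57, 1e-4), D_beta(175_000, 57, 1e-4))
print(D_beta(174_000.5, 57, 1e-4))   # a non-integer number of tickets
```

At integer n the two functions should agree to many digits, and at non-integer n the Beta form interpolates between the neighboring binomial values.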

In our examples, ∏_i D(w_i; w_i, p_i) is much less than ε, and consequently n_i* > w_i
for some i. As D(n; w, p) is monotonically increasing in n, we have an equality
∏_i D(n_i*; w_i, p_i) = ε. This is the first benefit, and it implies by (21) an inequality
P(n*; w, p) ≤ ε. Therefore, as in §3, if all N people in the gambling population
spent at least c·n* on tickets, the probability that one or more of the gamblers
would win at least w_i times on bet i for all i is at most Nε. To say it differently:
the solution c·n* to (22) is an underestimate of the minimum plausible spending
required to win so many times.

The second benefit of extending the domain of the optimization problem is to
make the problem convex instead of combinatorial. The convexity allows us to show
that any local minimum (as found by the computer) attains the global minimal
value.

Proposition 6.1. A local minimizer n* for the optimization problem (22) (relaxed
to include non-integer values of n_i) attains the global minimal value.

Proof. We shall show that the set of values of n over which we optimize, the feasible
set,

(24) \{n \in \mathbb{R}^b \mid n_i \ge w_i \text{ for all } i\} \cap \Bigl\{n \in \mathbb{R}^b \Bigm| \prod_i D(n_i; w_i, p_i) \ge \varepsilon\Bigr\},

is convex. As the objective function c · n is linear in n, the claim follows.

The first set in (24) defines a polytope, which is clearly convex. Because the
intersection of two convex sets is convex, it remains to show that the second set is
also convex.

The logarithm is a monotonic function, so taking the log of both sides of an
inequality preserves the inequality, and we may write the second set in (24) as:

(25) \Bigl\{n \in \mathbb{R}^b \Bigm| \sum_i \log D(n_i; w_i, p_i) \ge \log \varepsilon\Bigr\}.

For 0 < x < 1 and α, β positive, the function

    \beta \mapsto \log I_x(\alpha, \beta)

is concave by [10, Cor. 4.6(iii)]. Hence log D(n_i; w_i, p_i) is concave for n_i ≥ w_i. A
sum of concave functions is concave, so the set (25) is a convex set, proving the
claim. □

Example 6.2 (Louis Johnson 3). If we solve (22) for Louis Johnson’s wins
(including not only his Play 4 wins but also many of his prizes from scratcher games),
we find a minimum amount spent of at least $2 million for ε = 5 × 10^{-14}.
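
To make (22) concrete, here is a toy instance solved by brute force. Everything about it is an illustrative assumption: there are only two bets, they are treated as independent so that the product in (22) applies, the ticket counts are kept integral, and the costs, probabilities, and win counts are invented rather than taken from any gambler's record. It is not the convex solver used for Example 6.2; the function names are ours.

```python
import math

def log_pmf(k, n, p):
    """Log of the binomial probability C(n, k) p^k (1 - p)^(n - k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def D(n, w, p):
    """P(at least w wins with n independent tickets), equation (5)."""
    if w <= n * p:
        return 1.0 - sum(math.exp(log_pmf(k, n, p)) for k in range(w))
    total = 0.0
    for k in range(w, n + 1):       # terms decrease in k above the mean
        term = math.exp(log_pmf(k, n, p))
        total += term
        if term <= 1e-17 * total:
            break
    return total

def min_spend_two_bets(bet1, bet2, eps, n_hi=1_000_000):
    """Minimize c1*n1 + c2*n2 subject to n_i >= w_i and D1(n1)*D2(n2) >= eps.
    Each bet is a (cost, win probability, wins) triple. Scans n1 and, for each
    n1, bisects for the smallest feasible n2 (D is increasing in n)."""
    (c1, p1, w1), (c2, p2, w2) = bet1, bet2
    best = (math.inf, None, None)
    n1 = w1
    while n1 <= n_hi and c1 * n1 + c2 * w2 < best[0]:
        d1 = D(n1, w1, p1)
        if d1 * D(n_hi, w2, p2) >= eps:     # some n2 <= n_hi is feasible
            lo, hi = w2 - 1, n_hi           # lo is infeasible, hi is feasible
            while hi - lo > 1:
                mid = (lo + hi) // 2
                if d1 * D(mid, w2, p2) >= eps:
                    hi = mid
                else:
                    lo = mid
            cost = c1 * n1 + c2 * hi
            if cost < best[0]:
                best = (cost, n1, hi)
        n1 += 1
    return best

# Hypothetical bets: (cost in dollars, win probability, number of wins)
cost, n1, n2 = min_spend_two_bets((1.0, 1e-4, 5), (2.0, 1e-3, 4), eps=5e-14)
print(f"minimum spend ${cost:,.0f} with n1 = {n1}, n2 = {n2}")
```

The scan can stop as soon as the cost of n1 alone exceeds the best total found, which is why the loop condition compares c1*n1 + c2*w2 with the current best.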

Monotonicity. Some of the gamblers we studied for the investigative report claimed 
prizes in more than 50 different lottery games. In such cases it is convenient to solve 
(22) for only a subset of the games to ease computation by reducing the number of 
variables. Since removing restrictions results in minimizing the same function over 
a set that strictly includes the original set, the resulting “relaxed” optimization 
problem still gives a lower bound for the gambler’s minimum amount spent. 
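
The monotonicity argument can be illustrated with a toy brute-force search. This is a minimal sketch under stated assumptions: it works with integer numbers of bets rather than the relaxed problem, it handles at most two games, and the games, costs, and cutoff `eps` are hypothetical values picked so that the search is fast. The function names are ours and do not refer to any library.

```python
import math

def tail_prob(n, w, p):
    """P(at least w wins in n independent bets), via the binomial sum."""
    return 1.0 - sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(w))

def min_bets(w, p, target, n_max=10_000):
    """Smallest integer n >= w with tail_prob(n, w, p) >= target (None if not found)."""
    for n in range(w, n_max + 1):
        if tail_prob(n, w, p) >= target:
            return n
    return None

def min_spend(games, eps, n_max=10_000):
    """Brute-force minimum of sum(c_i * n_i) s.t. n_i >= w_i and prod D_i >= eps.

    `games` is a list of one or two (cost, wins, win_probability) tuples.
    """
    c1, w1, p1 = games[0]
    if len(games) == 1:
        n1 = min_bets(w1, p1, eps, n_max)
        return None if n1 is None else c1 * n1
    c2, w2, p2 = games[1]
    best = None
    for n2 in range(w2, n_max + 1):
        if best is not None and c2 * n2 >= best:
            break  # spending on game 2 alone already matches the best total
        d2 = tail_prob(n2, w2, p2)
        if d2 < eps:
            continue  # game 1 would need a probability above 1
        n1 = min_bets(w1, p1, eps / d2, n_max)
        if n1 is not None:
            cost = c1 * n1 + c2 * n2
            best = cost if best is None else min(best, cost)
    return best

# Hypothetical games: (ticket cost, number of wins, win probability).
game_a = (1, 2, 0.01)
game_b = (5, 1, 0.02)
eps = 1e-3  # hypothetical cutoff, far larger than the one used in the text

full_cost = min_spend([game_a, game_b], eps)
sub_cost = min_spend([game_a], eps)  # drop game B and its constraint
print("both games:", full_cost, " game A only:", sub_cost)
```

Dropping game B can only lower the minimum, because every probability is at most 1 and spending on game B is nonnegative, so the smaller problem remains a valid lower bound.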

7. The man from Hollywood 

Louis Johnson’s astounding tally of 252 prizes is beaten by a man from Hollywood, 
Florida, whom we refer to as “H.” During the same time period, H claimed 570 prizes, 
more than twice as many as Johnson did. Yet Mower’s news report [17] stimulated 
a law enforcement action against Johnson but not against H. Why? 

All but one of H’s prizes are in Play 4, which is really different from scratcher 
games: if you buy $100 worth of scratcher tickets for a single $1 game, this amounts 
to 100 (almost) independent Bernoulli trials, each of which is like playing a single 
$1 scratcher ticket. In Play 4, you can bet any multiple of $1 on a number to 
win a given drawing; if you win (which happens with probability \(p = 10^{-4}\)), then 
you win 5000 times your bet. If you bet $100 on a single Play 4 draw, your odds 
of winning remain \(10^{-4}\), but your possible jackpot becomes $500,000, and if you 
win, the Florida Lottery records this in the list of claimed prizes as if it were 100 
separate wins. Clearly, these are wins on dependent bets. 
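
As a rough worked comparison, using only the figures in this paragraph and assuming that separate $1 bets are placed on distinct, independent drawings: both ways of spending $100 have the same expected payout, but very different chances of producing 100 recorded wins.

\[ \text{expected payout: } \; 10^{-4} \times \$500{,}000 \;=\; 100 \times 10^{-4} \times \$5{,}000 \;=\; \$50 \]

\[ P(\text{one \$100 bet wins}) = 10^{-4}, \qquad P(\text{100 separate \$1 bets all win}) = (10^{-4})^{100} = 10^{-400} \]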

So, to infer how much H had to spend on the lottery for his wins to be unsurprising, 
first we have to estimate how much he bet on each drawing. Unfortunately, we 
cannot deduce this from the list of claimed prizes, because it includes the date the 
prize was claimed but not the specific drawing the ticket was for. (Louis Johnson’s 
Play 4 prizes were all claimed on distinct dates, so it is reasonable to assume they 
were bets on different draws.) The Palm Beach Post paid the Florida Lottery to 
retrieve a sample of H’s winning tickets from their archives. We think H’s winning 
plays were as in Table 1. 


date          number played    amount wagered 
12/6/2011     6251             $52 
??            ????             $1 
11/11/2012    4077             $101 
12/31/2012    1195             $2 
2/4/2013      1951             $212 
3/4/2013      1951             $200 

Table 1. H’s Play 4 wins during 2011-2013 


To find a lower bound on the amount H spent by solving the optimization problem 
(22), we imagine that he played several different Play 4 games, distinguished 
by their bet size. For simplicity, let us pretend that a player can bet $1, $50, $100, 
or $200, and suppose we observed H winning these bets 2, 1, 1, and 2 times, respectively. 
Using these as the parameters in (22) and the same probability cutoff 
\(\varepsilon = 5 \times 10^{-14}\) gives a minimum amount spent of just $96,354. 

But we can find a number tied more closely to H’s circumstances. In 2011-2013, 
he claimed $2.84 million in prizes. These are subject to income tax. If his tax 
rate was about 35%, he would have taken home about $1.85 million. If he spent 
that entire sum on Play 4 tickets, what is the probability that he would have won 
so much? We can find this by solving the following optimization problem with 
\(p = 10^{-4}\), \(w = (2, 1, 1, 2)\), and \(c = (1, 50, 100, 200)\): 

\[ \max \prod_{i=1}^{4} D(n_i; w_i, p) \quad \text{s.t.} \quad w_i \le n_i \ \text{ and } \ c \cdot n \le 1.85 \times 10^{6}. \]

The solution is about 0.0016, or one-in-625: it is plausible that H was just lucky. 
That’s because he made large, dependent bets, while we know from the examples 
above that betting a similar sum on smaller, independent bets is less likely to 
succeed. 

This illustrates a principle of casino gambling from [6, p. 170] or [16, #37]: bold 
play is better than cautious play. If you are willing to risk $100 betting red-black 
on a game of roulette, and you only care about doubling your money at the end of 
the evening, you are better off wagering $100 on one spin and then stopping, rather 
than placing 100 $1 bets. 
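
The roulette comparison can be made concrete with a short calculation. This sketch is our own illustration rather than a formula taken from the cited references, and it assumes an American wheel, where red wins with probability 18/38. It reads "cautious play" in two ways: exactly 100 bets of $1, which doubles the bankroll only if every bet wins, and $1 bets repeated until the bankroll reaches $0 or $200, which is the classical gambler's ruin problem. All function names are hypothetical.

```python
def p_double_bold(p_win):
    """Stake everything on one even-money bet: double with probability p_win."""
    return p_win

def p_double_all_wins(p_win, n_bets):
    """Exactly n_bets unit bets: doubling requires winning every one of them."""
    return p_win ** n_bets

def p_double_timid(p_win, stake, goal):
    """Unit bets repeated until the bankroll hits 0 or goal (gambler's ruin)."""
    if p_win == 0.5:
        return stake / goal
    r = (1.0 - p_win) / p_win
    return (1.0 - r ** stake) / (1.0 - r ** goal)

p_red = 18 / 38  # assumption: American wheel, 18 red pockets out of 38

bold = p_double_bold(p_red)
strict = p_double_all_wins(p_red, 100)
timid = p_double_timid(p_red, 100, 200)

print(f"one $100 bet:               {bold:.4f}")
print(f"exactly 100 bets of $1:     {strict:.3e}")
print(f"$1 bets until $0 or $200:   {timid:.3e}")
```

Under either reading of cautious play, the single bold bet is far more likely to double the bankroll.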

8. The real world 

How did this paper come to be? One of us, Lawrence Mower, is an investigative 
reporter in Palm Beach, Florida. His job is to find interesting news stories and 
spend 4-6 months investigating them. He wondered whether something might be 
going on with the Florida Lottery, so he obtained the list of prizes and contacted 
the other three of us to help analyze the data. Below we describe some of the 
non-mathematical aspects. 

What some people get up to. Various schemes can result in someone claiming 
many prizes. 

Clerks at lottery retailers have been known to scratch the wax on a ticket lightly 
with a pin, revealing just enough of the barcode underneath to be able to scan it, 
as described in [15, paragraph 75]. If they scan it and it’s not a winner, they’ll sell 
it to a customer, who may not notice the very faint scratches on the card. Lottery 
operators in many states replaced the linear barcode with a 2-dimensional barcode 
to make this scam more difficult, but it still goes on: a California clerk was arrested 
for it on 9/25/14. 

Sometimes gamblers will ask a clerk to check whether a ticket is a winner. If it 
is, the clerk might say it’s a loser, or might say the ticket is worth less than it really 
is, then claim the prize at the lottery office—and become the recorded winner. Of 
course, most clerks are honest, but this scheme is popular; see, for example, [15, 
paragraphs 47, 48, 80, 146]. 

Another angle, ticket aggregation, goes as follows. A gambler who wins a prize 
of $600 or more may be reluctant to claim the prize at the lottery office. The office 
might be far away; the gambler might be an illegal alien; or the gambler might 
owe child support or back taxes, which the lottery is required to subtract from 
the winnings. In such cases, the gambler might sell the winning ticket to a third 
party, an aggregator, who claims the prize and is recorded to be the winner. The 
aggregator pays the gambler less than face value, to cover income tax (paid by the 
aggregator) and to provide the aggregator a profit. The market rate in Florida is 
$500-$600 for a $1000 ticket. 

Some criminals have acted as aggregators to launder money. They pay the 
gambler in cash, but the lottery pays them with a check, “clean” money because 
it is already in the banking system. Notorious Boston mobster Whitey Bulger [4] 
and Spanish politician Carlos Fabra [9] are alleged to have used this dodge. 

When questioned by Mower, some of our suspects confessed to aggregating tickets, 
which is a crime in Florida (Florida statute 24.101, paragraph 2). 

Outcomes in Florida. Before Mower’s story appeared, he interviewed Florida 
Lottery Secretary Cynthia O’Connell about these gamblers. She answered that 
they could be lucky: “That’s what the lottery is all about. You can buy one ticket 
and you become a millionaire” [17]. Our calculations show that for most of these 
10 gamblers, this is an implausible claim. O’Connell and the Florida Lottery have 
since announced reforms to curb the activities highlighted here [18]. They stopped 
lottery operations at more than 30 stores across the state and seized the lottery 
terminals at those stores. 

More news stories and outcomes in other states. Further stories about “too 
frequent” winners have now appeared in California (KCBS Los Angeles 10/30/14, 
KPIX San Francisco 10/31/14), Georgia (Atlanta Fox 5 News 9/12/14, Atlanta 
Journal-Constitution 9/18/14), Indiana (ABC 6 Indianapolis, 2/19/15), Iowa (The 
Gazette, 1/23/15), Kentucky (WLKY, 11/20/14), Massachusetts (Boston Globe, 
7/20/14), Michigan (Lansing State Journal, 11/18/14), New Jersey (Asbury Park 
Press, 12/5/14 & 2/18/15; USA Today, 2/19/15), and Ohio (Dayton Daily News 
9/12/14). In Massachusetts, ticket aggregation is not illegal per se. In California, 
the lottery makes no effort to track frequent winners. 

In Georgia, ticket aggregation is illegal but the law had not been enforced. The 
practice was so widespread that elementary calculations (much simpler than those 
presented in this article) cast suspicion on 125 people. This gap in enforcement, in 
principle easy to detect, came to light as a consequence of the much more challenging 
investigation in Florida described here. This led to a change in policy announced by 
the Georgia Lottery Director, Debbie Alford, on 9/18/14: “We believe that most 
of these cases involved retailers agreeing to cash winning tickets on behalf of their 
customers — a violation of law, rules, and regulations.” 